### Loading libraries
library(gapminder)
library(tidyverse)
library(dplyr)
library(forcats)
library(ggplot2)
library(plotly)
Task: Choose one dataset (of your choice) and a variable to explore. After ensuring the variable(s) you’re exploring are indeed factors, you should:
Drop factor / levels; Reorder levels based on knowledge from data.
Explore the effects of re-leveling a factor in a tibble by:
comparing the results of arrange on the original and re-leveled factor. Plotting a figure of before/after re-leveling the factor (make sure to assign the factor to an aesthetic of your choosing). These explorations should involve the data, the factor levels, and at least two figures (before and after.
First, let’s check if the variable of choice is a factor or not. #### Checking class
class(gapminder$country) #confirms that the country variable is a factor
## [1] "factor"
#levels(gapminder$country) : to see all the levels in the factor-country
nlevels(gapminder$country) # shows the number of countries in the dataset
## [1] 142
Now, let’s select all the countries in the continent of Asia and see how drop function can be useful. The former set of code evaluates without using drop function. When calculating number of countries in Asia we get the number of total countries in the database ie 142. The latter code uses drop function and gives the actual number of countries in Asia ie 33. This showcases the importance of using drop function while working with factors.
#Let's filter all the countries in the continent of Asia
Asia_country<- gapminder %>%
filter(continent == c("Asia"))
nlevels(Asia_country$country) # counting number of countries in Asia without using the drp
## [1] 142
Asia_country$country %>%
fct_drop() %>% #droplevels() is the base r equivalent and can also be used.
nlevels()
## [1] 33
To show how reordering can be helpful, let’s consider one particular year. For this purpose, 2007 has been chosen and the corresponding lollipop plot has been plotted.
q<-Asia_country %>%
filter(year=="2007") %>%
ggplot(aes(x = country, y = lifeExp))+
geom_segment(aes(x=country, xend=country, y=0, yend=lifeExp), color="skyblue")+
geom_point(color="blue", size=3, alpha=0.5)+
labs(
x = "Country",
y = "Life Expectancy")+
theme_bw()+
coord_flip()
q %>%
ggplotly()
p<-Asia_country %>%
filter(year=="2007") %>%
ggplot(aes(x = fct_reorder(country, lifeExp), y = lifeExp))+
geom_segment(aes(x = fct_reorder(country,lifeExp), xend = fct_reorder(country, lifeExp), y=0, yend=lifeExp), color="skyblue")+
geom_point(color="blue", size=3, alpha=0.5)+
theme_bw()+
labs(
x = "Country",
y = "Life Expectancy")+
coord_flip()
p %>%
ggplotly()
Task: Experiment with at least one of:
write_csv()/read_csv() (and/or TSV friends), saveRDS()/readRDS(), dput()/dget().
Further, the data is read back using here() function and explored using
gapminder_inclife<- gapminder %>%
group_by(country) %>%
arrange(year) %>%
mutate(incr_lifeexp = lifeExp-first(lifeExp)) %>%
arrange(country)%>%
filter(continent==c("Asia","Africa"))
DT::datatable(gapminder_inclife)
write.csv(gapminder_inclife, here::here("HW05","gapminder_inclife.csv"))
read.csv(here::here("HW05","gapminder_inclife.csv")) %>%
DT::datatable()
##fct_reorder2(gapminder_inclife$continent, gapminder_inclife$gdpPercap,min)
fct_count(fct_drop((gapminder_inclife$continent)))
## # A tibble: 2 x 2
## f n
## <fct> <int>
## 1 Africa 312
## 2 Asia 198
Same number of entries(510) and same variables can be observed in the initital tibble and the one after exporting to the system and re-reading it. ## Exercise 4
Task: Create a side-by-side plot and juxtapose your first attempt (show the original figure as-is) with a revised attempt after some time spent working on it and implementing principles of effective plotting principles. Comment and reflect on the differences.
For the assignment three, I had plotted GDP per capita vs life expectancy in Asia. Here is the before-visualization lecture:
gapminder %>%
filter(continent=='Asia') %>%
group_by(year) %>%
arrange(year) %>%
ggplot(aes(gdpPercap,lifeExp, size=pop/1000000))+
geom_point(alpha=0.7, shape=21, color="purple")+
labs(title="GDP per cap vs Life expectancy in Asia",
x="GDP per capita",
y="Life Expectancy")
Now, let’s see ‘after’ the visualization lecture:
p<-gapminder %>%
filter(continent=='Asia') %>%
group_by(year) %>%
arrange(year) %>%
ggplot(aes(gdpPercap,lifeExp, size=pop/1000000))+
geom_point(alpha=0.3, shape=21, color="purple")+
labs(title="GDP per cap vs Life expectancy in Asia",
x="GDP per capita",
y="Life Expectancy")
ggplotly(p)